Genomic epidemiology of SARS- CoV-2 Omicron variants in the Republic of Korea

The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a global pandemic since 2019. Variants of concern (VOCs) declared by the World Health Organization require continuous monitoring because of their possible changes in transmissibility, virulence, and antigenicity. The Omicron variant, a VOC, has become the dominant variant worldwide since November 2021. In the Republic of Korea (South Korea), the number of confirmed cases increased rapidly after the detection of Omicron VOC on November 24, 2021. In this study, we estimated the underlying epidemiological processes of Omicron VOC in South Korea using time-scaled phylodynamic analysis. Three distinct phylogenetic subgroups (Kor-O1, Kor-O2, and Kor-O3) were detected in South Korea. The Kor-O1 subgroup circulated in the Daegu region, whereas Kor-O2 and Kor-O3 circulated in Incheon and Jeollanam-do, respectively. The viral population size and case number of the Kor-O1 subgroup increased more rapidly than those of the other subgroups, indicating the rapid spread of the virus. The results indicated the multiple introductions of Omicron sub-lineages into South Korea and their subsequent co-circulation. The evolution and transmission of SARS-CoV-2 should be continuously monitored, and control strategies need to be improved to control the multiple variants.


Results
Maximum-likelihood phylogenetic tree reveals three different virus groups transmitted in South Korea. A total of 582 sequences of the Omicron variant were obtained from November 24, 2021, to January 11, 2022. The proportion of the Omicron variant rapidly increased during this period and replaced the previously dominant Delta variant in South Korea. Of the various sub-lineages, the BA.1 sub-lineage was dominant (Fig. 1a).
We constructed approximately maximum-likelihood phylogenetic trees of BA.1 sub-lineages and detected 128 distinct monophyletic subgroups sharing a common ancestral node with other Korean viruses, with a local support value of > 0.7. Most subgroups caused only single or short local outbreaks; however, three subgroups-Kor-O1 (n = 273), Kor-O2 (n = 68), and Kor-O3 (n = 86)-caused multiple local outbreaks in South Korea (Fig. 1b). In the Kor-O1 subgroup, two inbound travelers were detected, which included imported cases from Ivory Coast on December 30, 2021, and Congo on December 29, 2021; however, no inbound travelers were detected in the Kor-O2 or Kor-O3 subgroups (Table S1). The Kor-O1 subgroup also included 35 sequences detected outside South Korea, indicating the possible spread of viruses from South Korea to other countries.
The Kor-O1 and Kor-O3 subgroups belong to the BA.1.1 sub-lineage of Omicron, and the Kor-O2 subgroup belongs to BA.1. Korean Omicron sub-lineages have additional mutations from the original Omicron strain, hCoV-19/South Africa/NICD-N20868/2021, collected on November 11, 2021. All three subgroups have T5730C in ORF 1a and T462A in the ORF3a gene. The Kor-O1 subgroup mutated A603T and G604C in the N gene, Kor-O2 mutated G955A in the N gene, and The Kor-O3 subgroup mutated C2470T in the ORF 1a gene and C473T in the M gene (Fig. S1).
Genomic epidemiology reveals that these three subgroups have different tendencies for geographic transmission. To investigate the time to most recent common ancestor (tMRCA), mutation rates, and transmission trends, we constructed a time-scaled phylogenetic tree. The tMRCA results showed that all three Korean subgroups were introduced into South Korea between mid-November and mid-December, and Kor-O2 was estimated to be the first subgroup. For the Kor-O1 and Kor-O3 subgroups, the difference between the first isolation date and the mean of tMRCA was 9 days, and for Kor-O2, it was 12 days. The mutation rates for all three subgroups were between 1.0 × 10 −3 and 1.3 × 10 −3 substitutions/site/year and did not show significant differences between the subgroups (Table 1).
To estimate the transmission dynamics of the Omicron variant circulating in South Korea, we reconstructed ancestral locations and inferred migration of viruses for three Korean subgroups (Kor-O1, Kor-O2, and Kor-O3). Our results showed that the three subgroups had different geographic transmission tendencies. The Kor-O1 subgroup spread to 13 out of 17 states. On December 17, 2021, the Kor-O1 subgroup was first detected in Gyeonggi-do, and it was the major site of the outbreak until December 24, 2021. On December 31, 2021, it was primarily spread in Daegu, with an increasing rate of transmission from Daegu and Gyeongsangbuk-do to other regions (Fig. 2, Table 2, and Video S1).
In the case of the Kor-O2 subgroup, the virus was primarily detected in three states: Incheon, Gangwon-do, and Busan. Most transmissions started from Incheon to Gangwon-do before mid-December, located in the northern region of South Korea. However, after mid-December, transmission mainly proceeded in Gangwon-do, and nearly all transmissions were directed to Busan, the southern region of South Korea (Fig. 3, Table 3, Video S2).
Finally, in the case of Kor-O3, we detected that the transmission started in Jeollabuk-do by the end of December. Most transmissions were observed in Jeollanam-do, Jeollabuk-do, and Gwangju, which are in the western region of South Korea. We observed that Kor-O3 in Jeollabuk-do spread mainly to Jeollanam-do, Seoul, and Daegu and that in Jeollanam-do spread mainly to Gyeongsangbuk-do, Gwangju, and Incheon. From the beginning of January 2022, many transmissions were observed from Gyeongsangbuk-do to Gyeonggi-do and Daejeon (Fig. 4, Table 4, Video S3).
Population size dynamic analysis showed that the Kor-O2 subgroup was discovered first, followed by Kor-O3 and Kor-O1. Whereas the population size of the Kor-O2 subgroup was constant, the population size of the Kor-O1 and Kor-O3 subgroups continuously increased. Furthermore, the Kor-O2 subgroup was not detected after the last case detected on January 5, 2022 (Fig. 5).

Discussion
The mutations of the Omicron variant were concentrated in the spike protein, including four deletions and one insertion (Δ69-70, Δ143-145, Δ211-212, ins214EPE) 18,19 . Mutations in the spike protein can enhance the ability of the variant to evade current vaccines 16,20 . In addition, mutations in the receptor binding domain could enhance receptor binding ability, which could affect virus transmissibility 21,22 . The Omicron variant replaced the previous dominant lineage, Delta, and has become the dominant variant worldwide.
Omicron is classified into several sub-lineages (BA.1-5) based on its mutations in PANGOLIN (https:// pango lin. cog-uk. io/) 23 . In addition, several recombinant lineages between Delta and Omicron or between BA.1 and BA.2 have been continuously reported 24,25 . The recombinant lineages were not detected in South Korea in this study because we analyzed data collected in the very early stages of the Omicron outbreak. Although BA.2 sequences were also detected, most cases in the early stages of the Omicron outbreak in South Korea belong to BA.1. Therefore, this study focused on BA.1 and its sub-lineage BA. 1  Our study highlights the three Omicron subgroups that co-circulated in South Korea in different regions. These semi-separated circulations might have contributed to the continuous outbreak of each subgroup. As shown in the spread of the Alpha, Delta, and Omicron VOCs, a higher transmissible variant could replace the prevailing strain 26 . Here, the KOR-O2 subgroup was not detected at the end of this study. However, the Table 1. First detection date, mutation rates, and time to most recent common ancestor of each Korean Omicron subgroup. 1 Time to most recent common ancestor. 2 95% HPD, 95% height posterior density. 3 Substitutions/site/year.        www.nature.com/scientificreports/ co-circulation of multiple variants should be carefully monitored. The co-circulation of different variants could be an obstacle to disease control using vaccines and therapeutics. More extended monitoring and continued genomic epidemiology will be required to identify circulating variants. The Kor-O1 subgroup caused the largest outbreak and spread to most of the regions in South Korea. While the spread of Kor-O2 and Kor-O3 started in Incheon and Jeollabuk-do, respectively, the Kor-O1 subgroup started in Gyeonggi-do. Seoul and Gyeonggi-do are the major metropolitan areas in South Korea and have the largest population and movement of population. Outbreaks in metropolitan regions may contribute to the more frequent and rapid spread of Kor-O1 than that of other subgroups. During our investigation period, the vaccination rate in South Korea was approximately 80%; nevertheless, the Omicron variant rapidly spread and caused a large outbreak. Vaccination could not completely prevent SARS-CoV-2 infection, but the effect of vaccination on the severity and transmission of the disease should be carefully evaluated by further studies.
The ancestral reconstruction and phylogeographic analysis used in this study can be affected by sampling bias 27 . We selected viruses for sequencing based on field epidemiological investigation results to minimize the impact of sampling bias. In addition, we did not conduct down-sampling and used all available sequences to avoid other sampling biases. However, the potential role of unsampled cases and less sampled populations cannot be entirely excluded. Owing to the large outbreak after the introduction of Omicron, sequence analysis was conducted for only approximately 3% of confirmed cases. The possible existence of undetected sub-clinical   www.nature.com/scientificreports/ infections also could not be analyzed. Such sampling biases and low sequencing coverage can affect the results of the phylogeographic analysis. However, this study presents the most probable transmission dynamics of the Omicron variant in South Korea by reconstructing ancestors in phylogenetic analysis, using all of the available sequence data. As the number of Omicron cases in South Korea increases, it is essential to estimate the spread of variants using molecular epidemiology. However, the early detection of novel variants among the numerous cases can be limited, owing to the restriction of sequencing coverage. We estimate the time of introduction of the virus based on a time-scaled phylogenetic tree, and the results show that South Korea's current sequence surveillance system could detect new viral introductions within 9 to 12 days.
Omicron variant sub-lineages continue to be detected worldwide, and viruses are evolving into various sublineages. As SARS-CoV-2 is becoming endemic in most countries, the Korean government took steps to relax its quarantine policy and reduce the level of social distancing for a gradual return to normal life. Although the number of confirmed cases is gradually decreasing, novel variants of SARS-CoV-2 continuously threaten public health. Because multiple introductions and co-circulation of variants were detected in this study, the relaxed quarantine policy will require improvements in the variant monitoring system and control strategies.

Methods
Virus detection and whole-genome sequencing. Nasopharyngeal and oropharyngeal swabs and sputum samples were collected from symptomatic patients to detect SARS-CoV-2 by real-time reverse transcriptase (RT)-PCR. According to the manufacturer's protocol, RNA was extracted from 140 μL of the sample using a Qiagen viral RNA mini kit (Qiagen, Hilden, Germany). Real-time RT-PCR was performed on the extracted RNA and cycle threshold 28 .
The samples were chosen for sequencing from the outbreak data based on epidemiological links. In addition, we selected samples from sporadic cases and a few random representative samples from epidemiologically linked large outbreaks. Whole-genome sequences of selected viruses were analyzed using the ARTIC protocol (https:// artic. netwo rk/ ncov-2019). Libraries were prepared using the Nextera DNA Flex Library Prep Kit (Illumina, USA), and sequencing was performed by the MiSeq instrument using the MiSeq Reagent Kit V2 (Illumina, USA).
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institution and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The study was approved by the Institutional Review Board of the Korea Disease Control and Prevention Agency (approval number: 2020-03-01-P-A) and was designated a service to public health during the pandemic. Therefore, the Institutional Review Board waived the requirement for written informed consent, as outlined in the Title Laboratory Respondence to COVID-19.
Identify phylogenetic subgroups. BA.1 and BA.1.1 sub-lineages of Omicron full-genome sequences, identified as of January 31, 2022, in other countries, with > 29,000 nt and < 1% undefined bases, were downloaded from the GISAID EpiCoV™ database (https:// www. gisaid. org/) for reference sequences (n = 7053). For efficient computation and analysis, we selected 3330 sequences (1543 from Europe, 951 from North America, 543 from Asia, 212 from Oceania, 54 from Africa, and 27 from South America) based on the nucleotide sequence identified at the 99.97% level using the program CD-HIT 29 . We used Nextclade v1.13.2 (https:// clades. nexts train. org/) for quality control (QC) of 582 sequences isolated in South Korea. Nextclade uses an algorithm that considers missing data, mixed sites, private mutations, mutation clusters, stop codons, and frameshifts in calculating a score. A total of 32 sequences that exceeded the QC score of 100 were excluded, resulting in 550 sequences remaining.
Using Geneious Prime software (https:// www. genei ous. com/), each sequence was aligned to the reference sequence Wuhan-Hu-1 (GenBank ID MN908947). Aligned sequences were trimmed to equal lengths (29,409 bp) from the start codon of ORF1ab to the stop codon of ORF10. Frame shifting insertions in < 1% of sequences were considered erroneous sequencing results and were deleted through mask alignment in the Geneious software. The approximately maximum-likelihood phylogenetic tree was constructed using FastTree in Galaxy Europe (http:// usega laxy. eu/) under the general time reversible (GTR) model for the effective computation of big genomic data. The phylogenetic tree was visualized and annotated using iTOL (https:// itol. embl. de/). In the FastTree result, we divided Korean sequences into three subgroups with the monophyletic cluster sharing a common ancestral node with other Korean sequences showing a local support value of > 0.7. This resulted in a Kor-O1 subgroup with 273 sequences, a Kor-O2 subgroup with 68 sequences, and a Kor-O3 subgroup with 86 sequences.
To investigate the mutations of each subgroup of Korean sequences, we compared the Korean sequences with the first isolated Omicron, hCoV-19/South Africa/NICD-N20868/2021, collected on November 11, 2021, as a reference sequence.
Phylogeography and population size dynamics of each subgroup. For each Korean sub-lineage, we reconstructed the maximum-likelihood phylogenetic tree using Geneious Prime software, and the temporal signal of the dataset was analyzed using the TempEst program (https:// beast. commu nity/ tempe st) 30 . Outlier sequences in the root-to-tip regression were excluded. In addition, we deleted 2 sequences of inbound travelers and 35 reference foreign sequences from the data to focus on the geographical relationships in virus transmission in South Korea. The final data set of the Kor-O1 subgroup with 231 sequences, the Kor-O2 subgroup with 67 sequences, and the Kor-O3 subgroup with 86 sequences were used for time-scaled phylogeographic analysis. www.nature.com/scientificreports/ To estimate the transmission and introduction of Omicron in South Korea, we constructed a time-scaled phylogenetic tree for three subgroups (Kor-O1-Kor-O3) using BEAST v.1.10.4 31 . The sequences of each subgroup were coded as 17 states and other geographic locations by the administrative district in South Korea. Posterior phylogenetic tree distribution was estimated using Bayesian phylogenetic inference. The general time reversible model, along with a gamma-distributed rates (GTR + γ) nucleotide substitution model was selected. To estimate the viral population size change, the unweighted pair group method with an arithmetic starting tree model and Bayesian Skygrid coalescent tree prior was used with an uncorrelated relaxed clock model. For each subgroup, Markov-chain Monte Carlo was run in parallel for four chains, each with 150 million steps. The parameters and trees were sampled every 10,000 steps, yielding a total of 60,000 parameter states and posterior trees. The parameters were analyzed using TRACER v1.5 (https:// beast. commu nity/ tracer) 32 and burned in 10%-20% of each result. All parameters had an effective sample size of > 200 (Table S2). The log and tree files of each subgroup's resulting data were combined with LogCombiner v1.10.4 (https:// beast. commu nity/ logco mbiner). The combined log file was analyzed using TRACER 1.7.2 to reconstruct the effective population size over time that was estimated by the Gaussian Markov random field skyride model. A time-scaled maximum clade credibility tree was generated using TreeAnnotator v1.10.4 (https:// beast. commu nity/ treea nnota tor) in BEAST and visualized using FigTree 1.4.3 (http:// tree. bio. ed. ac. uk/ softw are/ figtr ee/), and the migration routes were visualized using SpreaD3 v0.9.6 (https:// beast. commu nity/ sprea d3) 33 .
The rate and number of transitions among the regions (Markov jumps) were estimated using stochastic mapping techniques implemented in the BEAST package 34 . Posterior trees were analyzed using the PACT program (http:// www. trevo rbedf ord. com/ pact) to compute the number of transition events between the regions through time. The posterior trees were broken into multiple temporal sections (one week per section). The migrations of the location state at the tree nodes were counted at each time window for each posterior tree.